.. _RapidML_API:

======================
RapidML API
======================

Getting started with RapidML is easy. All RapidML functions return an object of the ``rml`` class.

****************************
The ``RapidML.rml`` Class
****************************

.. autoclass:: RapidML.rml

``rml`` **class attributes**
----------------------------

``model`` :
^^^^^^^^^^^

This is the machine learning model generated by RapidML. It has already been trained on the training data and target supplied by the user, either as a single DataFrame or as ``X, y`` arrays, where ``X`` is the training data and ``y`` is the target. This attribute is never null.

``m_tpot`` :
^^^^^^^^^^^^

Note: This may be null, depending on which function was used. See usage_ for details.

This is a TPOT object, either a ``TPOTClassifier`` or a ``TPOTRegressor``. RapidML uses this object to find the optimal machine learning model for the supplied data. You can use the functions and attributes of ``rml.m_tpot`` to evaluate the trained model. For example, ``rml.m_tpot.score(testing_features, testing_classes)`` returns the score of the optimized pipeline on held-out testing data (accuracy by default for classification). See the TPOT documentation for all the available functions and attributes of ``rml.m_tpot``.

``d`` :
^^^^^^^

Note: This may be null, depending on which function was used. See usage_ for details.

This is a ``defaultdict`` holding the ``LabelEncoder`` objects that map the labels to their transformed values. It is populated only if we choose to label encode the table. See sklearn.preprocessing.LabelEncoder_ for more details.

.. _sklearn.preprocessing.LabelEncoder: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

.. _usage: RapidML%20API.html#rapidml-with-automated-machine-learning

``rml`` **class functions**
---------------------------

``put(self, mdl, d=None)`` :
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a method used by the RapidML functions to assign the attributes of ``rml`` objects.
Here ``mdl`` can be either a model supplied by the user or a TPOT object supplied by RapidML. If ``mdl`` is a TPOT object, then the ``model`` attribute is ``mdl.fitted_pipeline_`` (the best pipeline TPOT found for the training data) and the ``m_tpot`` attribute is the TPOT object itself. If ``mdl`` is instead a fitted (trained) machine learning model, then the ``model`` attribute is ``mdl`` and the ``m_tpot`` attribute is null. If we decide to label encode the training data, the ``d`` attribute is the ``d`` supplied as the function argument; otherwise, it is null.

``le(self, df)`` :
^^^^^^^^^^^^^^^^^^

The user may call this method on an ``rml`` object to label encode another dataset using the same encoding table that was applied to a previous, similar dataset. For example, to apply the same label transformation to two DataFrames that have the same columns but different rows, we first label encode the first table and then call this method to label encode the second one.

********************************************
``RapidML.rapid_classifier``
********************************************

.. autofunction:: RapidML.rapid_classifier(df, le = 'Yes', model = TPOTClassifier(generations=5, population_size=50, verbosity=2), name='RapidML_files')

``rapid_classifier`` label encodes the input DataFrame ``df``, depending on the user's input. It then uses a TPOT backend to perform an intelligent search for the best classifier for the input data and to optimize it. Finally, it populates the attributes of an ``rml`` object and returns that object.

Parameters
----------

``df``
^^^^^^^^^^

Type: ``pandas.DataFrame``

This is the input DataFrame provided by the user. Its initial columns are the training features and its last column is the target.

``le``
^^^^^^^^

Type: ``str``

The default value is ``'Yes'``.
If ``le`` is ``'Yes'``, RapidML label encodes the input DataFrame ``df`` and stores the ``LabelEncoder`` in a ``defaultdict``. If ``le`` is ``'No'``, label encoding is not performed. Any other value of ``le`` raises a ``ValueError``.

``model``
^^^^^^^^^^^

Type: ``tpot.TPOTClassifier``

The default value is ``tpot.TPOTClassifier(generations=5, population_size=50, verbosity=2)``.

This is a ``TPOTClassifier`` object. You can pass a ``TPOTClassifier`` object with a different parameter configuration to suit your requirements. In general, increasing ``generations`` and ``population_size`` tends to improve the model's accuracy, at the cost of a longer search. See TPOTClassifier_ for more details.

.. _TPOTClassifier: http://epistasislab.github.io/tpot/api/#classification

``name``
^^^^^^^^^^

Type: ``str``

The default value is ``'RapidML_files'``.

This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a list of column data types and a dummy user input as serialized ``dill`` files, together with the ``API.py`` and ``helper.py`` scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions through the web API.

Returns
--------

Returns an ``rml`` object. If ``le`` is ``'Yes'``, ``rml.d`` is populated; otherwise, it is null. ``rml.model`` and ``rml.m_tpot`` are always populated when using ``rapid_classifier``.

Files Created
-------------

``model``
^^^^^^^^^

This is the machine learning model generated by RapidML. It has been saved after serialization with ``dill``.

``d``
^^^^^

This is the ``defaultdict`` (a subclass of ``dict``) containing the LabelEncoder used to encode the labels in the DataFrame. It has been saved after serialization with ``dill``.

``df``
^^^^^^

This is the skeletal DataFrame, which contains only the headers and no data. It has been saved after serialization with ``dill``.
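The ``model``, ``d`` and ``df`` files are written with ``dill``. The sketch below shows the same save-and-reload round trip using the standard-library ``pickle`` module, since ``dill`` offers the same ``dump``/``load`` call shape. The directory name, the file names and the stand-in objects are assumptions for illustration only, not RapidML's exact on-disk layout.

```python
import os
import pickle
import tempfile
from collections import defaultdict

# Hypothetical stand-ins for the objects RapidML serializes.
artifacts = {
    "model": {"kind": "placeholder model"},          # stands in for rml.model
    "d": defaultdict(dict, {"colour": {"red": 0}}),  # stands in for the LabelEncoder table
    "df": ["colour", "size", "label"],               # headers only, like the skeletal DataFrame
}


def save_artifacts(directory, objects):
    """Serialize each object to its own file inside `directory`."""
    os.makedirs(directory, exist_ok=True)
    for name, obj in objects.items():
        with open(os.path.join(directory, name), "wb") as fh:
            pickle.dump(obj, fh)  # dill.dump(obj, fh) has the same call shape


def load_artifact(directory, name):
    """Load one serialized object back, as a serving script would."""
    with open(os.path.join(directory, name), "rb") as fh:
        return pickle.load(fh)  # dill.load(fh) has the same call shape


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "RapidML_files")  # assumed directory name
    save_artifacts(out_dir, artifacts)
    restored_headers = load_artifact(out_dir, "df")
    restored_table = load_artifact(out_dir, "d")
```

``pickle`` is enough for these simple objects; ``dill`` can additionally serialize a wider range of Python objects, such as lambdas, that ``pickle`` rejects.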

``dt``
^^^^^^

This is a list containing the data types of the columns in the input ``DataFrame``. It has been saved after serialization with ``dill``.

``f``
^^^^^

This is a string containing a dummy input value that can be fed to the API as a URL argument. It is the second row of the input ``DataFrame``, converted to a string. It has been saved after serialization with ``dill``.

``API.py``
^^^^^^^^^^

This is the Flask-API_ script that the server uses to accept user inputs, make predictions based on those inputs and return the predictions.

.. _Flask-API: http://flask.pocoo.org/

``helper.py``
^^^^^^^^^^^^^

This is a helper module used by ``API.py``; it performs the actual predictions with the model generated by RapidML.

******************************
``RapidML.rapid_regressor``
******************************

.. autofunction:: RapidML.rapid_regressor(df,le = 'No', model = TPOTRegressor(generations=5, population_size=50, verbosity=2), name='RapidML_files')

``rapid_regressor`` label encodes the input DataFrame ``df``, depending on the user's input. It then uses a TPOT backend to perform an intelligent search for the best regressor for the input data and to optimize it. Finally, it populates the attributes of an ``rml`` object and returns that object.

Parameters
----------

``df``
^^^^^^^^^^

Type: ``pandas.DataFrame``

This is the input DataFrame provided by the user. Its initial columns are the training features and its last column is the target.

``le``
^^^^^^^^

Type: ``str``

The default value is ``'No'``. If ``le`` is ``'Yes'``, RapidML label encodes the input DataFrame ``df`` and stores the ``LabelEncoder`` in a ``defaultdict``. If ``le`` is ``'No'``, label encoding is not performed. Any other value of ``le`` raises a ``ValueError``.
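With ``le='Yes'``, keeping the encoders in a ``defaultdict`` means each column gets its own encoder on first access, and the same table can later re-encode a similar DataFrame (the idea behind ``rml.le``). The following is a minimal, dependency-free sketch of that pattern, not RapidML's implementation: ``SimpleLabelEncoder`` is a hypothetical stand-in that mimics ``sklearn.preprocessing.LabelEncoder`` (sorted classes mapped to ``0..n-1``), dicts of columns stand in for DataFrames, and encoding every column is an assumption of the sketch.

```python
from collections import defaultdict


class SimpleLabelEncoder:
    """Hypothetical stand-in for sklearn's LabelEncoder: sorted classes -> 0..n-1."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Like sklearn, labels not seen during fit are rejected.
        unseen = [v for v in values if v not in self._index]
        if unseen:
            raise ValueError(f"unseen labels: {unseen}")
        return [self._index[v] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


def encode_table(table, d, fit=True):
    """Encode every column of `table` (column name -> list of values) with the encoders in `d`."""
    encoded = {}
    for column, values in table.items():
        encoder = d[column]  # the defaultdict creates a fresh encoder on first access
        encoded[column] = encoder.fit_transform(values) if fit else encoder.transform(values)
    return encoded


# First table: fit one encoder per column and keep them in the defaultdict.
d = defaultdict(SimpleLabelEncoder)
first = {"colour": ["red", "green", "red"], "label": ["yes", "no", "yes"]}
first_encoded = encode_table(first, d)

# A similar table with different rows: reuse the same encoders without refitting.
second = {"colour": ["green", "green"], "label": ["no", "yes"]}
second_encoded = encode_table(second, d, fit=False)
```

Reusing ``d`` without refitting is what keeps the codes consistent across tables: ``green`` maps to the same integer in both.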

``model``
^^^^^^^^^^^

Type: ``tpot.TPOTRegressor``

The default value is ``tpot.TPOTRegressor(generations=5, population_size=50, verbosity=2)``.

This is a ``TPOTRegressor`` object. You can pass a ``TPOTRegressor`` object with a different parameter configuration to suit your requirements. In general, increasing ``generations`` and ``population_size`` tends to improve the model's accuracy, at the cost of a longer search. See TPOTRegressor_ for more details.

.. _TPOTRegressor: http://epistasislab.github.io/tpot/api/#regression

``name``
^^^^^^^^^^

Type: ``str``

The default value is ``'RapidML_files'``.

This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a list of column data types and a dummy user input as serialized ``dill`` files, together with the ``API.py`` and ``helper.py`` scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions through the web API.

Returns
--------

Returns an ``rml`` object. If ``le`` is ``'Yes'``, ``rml.d`` is populated; otherwise, it is null. ``rml.model`` and ``rml.m_tpot`` are always populated when using ``rapid_regressor``.

Files Created
-------------

``model``
^^^^^^^^^

This is the machine learning model generated by RapidML. It has been saved after serialization with ``dill``.

``d``
^^^^^

This is the ``defaultdict`` (a subclass of ``dict``) containing the LabelEncoder used to encode the labels in the DataFrame. It has been saved after serialization with ``dill``.

``df``
^^^^^^

This is the skeletal DataFrame, which contains only the headers and no data. It has been saved after serialization with ``dill``.

``dt``
^^^^^^

This is a list containing the data types of the columns in the input ``DataFrame``. It has been saved after serialization with ``dill``.

``f``
^^^^^

This is a string containing a dummy input value that can be fed to the API as a URL argument. It is the second row of the input ``DataFrame``, converted to a string.
It has been saved after serialization with ``dill``.

``API.py``
^^^^^^^^^^

This is the Flask-API_ script that the server uses to accept user inputs, make predictions based on those inputs and return the predictions.

.. _Flask-API: http://flask.pocoo.org/

``helper.py``
^^^^^^^^^^^^^

This is a helper module used by ``API.py``; it performs the actual predictions with the model generated by RapidML.

********************************************
``RapidML.rapid_classifier_arr``
********************************************

.. autofunction:: RapidML.rapid_classifier_arr(X, Y, model = TPOTClassifier(generations=5, population_size=50, verbosity=2), name='RapidML_files')

The ``rapid_classifier_arr`` function is similar to ``rapid_classifier``, except that rather than receiving the features and target as a single input DataFrame, it receives the features as ``X`` (type ``numpy.ndarray``) and the target as ``Y`` (type ``numpy.ndarray``). Another important difference is that this function does not perform label encoding.

Parameters
----------

``X``
^^^^^

Type: ``numpy.ndarray`` or array-like

This is the input data.

``Y``
^^^^^

Type: ``numpy.ndarray`` or array-like

This is the target.

``model``
^^^^^^^^^

Type: ``tpot.TPOTClassifier``

The default value is ``TPOTClassifier(generations=5, population_size=50, verbosity=2)``.

This is a ``TPOTClassifier`` object. You can pass a ``TPOTClassifier`` object with a different parameter configuration to suit your requirements. In general, increasing ``generations`` and ``population_size`` tends to improve the model's accuracy, at the cost of a longer search. See TPOTClassifier_ for more details.

``name``
^^^^^^^^

Type: ``str``

The default value is ``'RapidML_files'``.

This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a list of column data types and a dummy user input as serialized ``dill`` files, together with the ``API.py`` and ``helper.py`` scripts.
Upload this directory to a web server to serve the model generated by RapidML and make predictions over the internet.

Returns
-------

Returns an ``rml`` object. ``rml.d`` is always null. ``rml.model`` and ``rml.m_tpot`` are always populated.

Files Created
-------------

``model``
^^^^^^^^^

This is the machine learning model generated by RapidML. It has been saved after serialization with ``dill``.

``API.py``
^^^^^^^^^^

This is the ``Flask`` API script that the server uses to accept user inputs, make predictions based on those inputs and return the predictions.

********************************************
``RapidML.rapid_regressor_arr``
********************************************

.. autofunction:: RapidML.rapid_regressor_arr(X, Y, model = TPOTRegressor(generations=5, population_size=50, verbosity=2), name='RapidML_files')

The ``rapid_regressor_arr`` function is similar to ``rapid_regressor``, except that rather than receiving the features and target as a single input DataFrame, it receives the features as ``X`` (type ``numpy.ndarray``) and the target as ``Y`` (type ``numpy.ndarray``). Another important difference is that this function does not perform label encoding.

Parameters
----------

``X``
^^^^^

Type: ``numpy.ndarray`` or array-like

This is the input data.

``Y``
^^^^^

Type: ``numpy.ndarray`` or array-like

This is the target.

``model``
^^^^^^^^^

Type: ``tpot.TPOTRegressor``

The default value is ``TPOTRegressor(generations=5, population_size=50, verbosity=2)``.

This is a ``TPOTRegressor`` object. You can pass a ``TPOTRegressor`` object with a different parameter configuration to suit your requirements. In general, increasing ``generations`` and ``population_size`` tends to improve the model's accuracy, at the cost of a longer search. See TPOTRegressor_ for more details.

``name``
^^^^^^^^

Type: ``str``

The default value is ``'RapidML_files'``.

This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a list of column data types and a dummy user input as serialized ``dill`` files, together with the ``API.py`` and ``helper.py`` scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions over the internet.

Returns
-------

Returns an ``rml`` object. ``rml.d`` is always null. ``rml.model`` and ``rml.m_tpot`` are always populated.

Files Created
-------------

``model``
^^^^^^^^^

This is the machine learning model generated by RapidML. It has been saved after serialization with ``dill``.

``API.py``
^^^^^^^^^^

This is the ``Flask`` API script that the server uses to accept user inputs, make predictions based on those inputs and return the predictions.

********************************************
``RapidML.rapid_udm``
********************************************

.. autofunction:: RapidML.rapid_udm(df, model, le = 'No', name='RapidML_files')

This function makes RapidML a versatile tool in the hands of experienced data scientists and developers. It works like ``rapid_regressor`` and ``rapid_classifier`` in that a single DataFrame containing both the input data and the target (the last column) is passed. However, it lets the user provide a ``sklearn`` model of their choice. Depending on the user's choice, label encoding is performed or skipped. The supplied model is then fitted (trained) on the input data and stored in the ``rml.model`` attribute.

Parameters
----------

``df``
^^^^^^^^^^

Type: ``pandas.DataFrame``

This is the input DataFrame provided by the user. Its initial columns are the training features and its last column is the target.
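The expected layout of ``df`` (features first, target last) can be illustrated by splitting such a table into the ``X, y`` pair that a model's ``fit`` receives. In this sketch, plain Python rows stand in for the DataFrame and ``split_features_target`` is a hypothetical helper; with pandas the same split is typically written ``df.iloc[:, :-1]`` and ``df.iloc[:, -1]``.

```python
def split_features_target(header, rows):
    """Split a table whose last column is the target into (feature names, X, y).

    Hypothetical helper mirroring the layout expected for ``df``:
    every column except the last is a feature.
    """
    if len(header) < 2:
        raise ValueError("need at least one feature column and one target column")
    feature_names = header[:-1]
    X = [row[:-1] for row in rows]  # initial columns -> training features
    y = [row[-1] for row in rows]   # last column -> target
    return feature_names, X, y


header = ["sepal_length", "sepal_width", "species"]
rows = [
    [5.1, 3.5, "setosa"],
    [7.0, 3.2, "versicolor"],
    [6.3, 3.3, "virginica"],
]
feature_names, X, y = split_features_target(header, rows)
```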

``model``
^^^^^^^^^^^

Type: ``sklearn`` model

This may be any model that supports the call ``model.fit(X, y)``, where ``X`` is the input data and ``y`` is the target.

``le``
^^^^^^^^

Type: ``str``

The default value is ``'No'``. If ``le`` is ``'Yes'``, RapidML label encodes the input DataFrame ``df`` and stores the ``LabelEncoder`` in a ``defaultdict``. If ``le`` is ``'No'``, label encoding is not performed. Any other value of ``le`` raises a ``ValueError``.

``name``
^^^^^^^^^^

Type: ``str``

The default value is ``'RapidML_files'``.

This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a list of column data types and a dummy user input as serialized ``dill`` files, together with the ``API.py`` and ``helper.py`` scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions through the web API.

Returns
--------

Returns an ``rml`` object. If ``le`` is ``'Yes'``, ``rml.d`` is populated; otherwise, it is null. ``rml.model`` is always populated, while ``rml.m_tpot`` is always null.

Files Created
-------------

``model``
^^^^^^^^^

This is the machine learning model generated by RapidML. It has been saved after serialization with ``dill``.

``d``
^^^^^

This is the ``defaultdict`` (a subclass of ``dict``) containing the LabelEncoder used to encode the labels in the DataFrame. It has been saved after serialization with ``dill``.

``df``
^^^^^^

This is the skeletal DataFrame, which contains only the headers and no data. It has been saved after serialization with ``dill``.

``dt``
^^^^^^

This is a list containing the data types of the columns in the input ``DataFrame``. It has been saved after serialization with ``dill``.

``f``
^^^^^

This is a string containing a dummy input value that can be fed to the API as a URL argument. It is the second row of the input ``DataFrame``, converted to a string.
It has been saved after serialization with ``dill``.

``API.py``
^^^^^^^^^^

This is the Flask-API_ script that the server uses to accept user inputs, make predictions based on those inputs and return the predictions.

.. _Flask-API: http://flask.pocoo.org/

``helper.py``
^^^^^^^^^^^^^

This is a helper module used by ``API.py``; it performs the actual predictions with the model generated by RapidML.

********************************************
``RapidML.rapid_udm_arr``
********************************************

.. autofunction:: RapidML.rapid_udm_arr(X, Y, model, name='RapidML_files')

The ``rapid_udm_arr`` function is similar to ``rapid_udm``, except that rather than receiving the features and target as a single input DataFrame, it receives the features as ``X`` (type ``numpy.ndarray``) and the target as ``Y`` (type ``numpy.ndarray``). Another important difference is that this function does not perform label encoding.

Parameters
----------

``X``
^^^^^

Type: ``numpy.ndarray`` or array-like

This is the input data.

``Y``
^^^^^

Type: ``numpy.ndarray`` or array-like

This is the target.

``model``
^^^^^^^^^

Type: ``sklearn`` model

This may be any model that supports the call ``model.fit(X, y)``, where ``X`` is the input data and ``y`` is the target.

``name``
^^^^^^^^

Type: ``str``

The default value is ``'RapidML_files'``.

This is the name of the directory that RapidML creates to store the machine learning model, the LabelEncoder dictionary, the skeletal input DataFrame, a list of column data types and a dummy user input as serialized ``dill`` files, together with the ``API.py`` and ``helper.py`` scripts. Upload this directory to a web server to serve the model generated by RapidML and make predictions over the internet.

Returns
-------

Returns an ``rml`` object. ``rml.model`` is always populated. ``rml.d`` and ``rml.m_tpot`` are always null.
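Which attributes end up populated follows from the assignment logic described for ``rml.put``: a TPOT object contributes both ``model`` (its ``fitted_pipeline_``) and ``m_tpot``, while an already fitted user model contributes only ``model``. The sketch below restates that rule in plain Python. ``RmlSketch``, ``FakeTPOT`` and ``FakeModel`` are hypothetical stand-ins, and recognising a TPOT object by its ``fitted_pipeline_`` attribute is an assumption of the sketch, not RapidML's source.

```python
class RmlSketch:
    """Hypothetical stand-in for RapidML's rml class (attributes only)."""

    def __init__(self):
        self.model = None
        self.m_tpot = None
        self.d = None

    def put(self, mdl, d=None):
        # Assumption: a TPOT object is recognised by its fitted_pipeline_ attribute.
        if hasattr(mdl, "fitted_pipeline_"):
            self.model = mdl.fitted_pipeline_  # best pipeline found by TPOT
            self.m_tpot = mdl                  # keep the TPOT object itself
        else:
            self.model = mdl                   # user-supplied, already fitted model
            self.m_tpot = None
        self.d = d                             # None unless label encoding was used
        return self


class FakeTPOT:
    fitted_pipeline_ = "best pipeline"


class FakeModel:
    pass


from_tpot = RmlSketch().put(FakeTPOT())
user_model = FakeModel()
from_udm = RmlSketch().put(user_model)
```

For ``rapid_udm_arr`` the second case always applies, and since no label encoding is done, ``d`` stays ``None`` as well.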

Files Created
-------------

``model``
^^^^^^^^^

This is the machine learning model generated by RapidML. It has been saved after serialization with ``dill``.

``API.py``
^^^^^^^^^^

This is the ``Flask`` API script that the server uses to accept user inputs, make predictions based on those inputs and return the predictions.
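The only requirement ``rapid_udm`` and ``rapid_udm_arr`` place on ``model`` is that it supports ``model.fit(X, y)``. To illustrate that contract, here is a hypothetical, dependency-free majority-class classifier following the scikit-learn convention of ``fit`` returning ``self``. The ``predict`` method is included on the assumption that the generated API needs it to make predictions; ``MajorityClassifier`` is not part of RapidML or scikit-learn, and in practice you would pass a real ``sklearn`` estimator.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical estimator that always predicts the most common training label.

    Follows the scikit-learn convention: fit(X, y) learns from the data and
    returns self; predict(X) returns one prediction per input row.
    """

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        # most_common(1) returns [(label, count)] for the most frequent label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


X = [[0.2, 1.0], [0.4, 0.9], [0.9, 0.1], [0.8, 0.3]]
Y = ["a", "b", "b", "b"]
model = MajorityClassifier().fit(X, Y)
predictions = model.predict([[0.5, 0.5], [0.1, 0.2]])
```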